ABSTRACT

"Linear threshold element" is the generic term for a device which
forms the sum $a_0 + a_1 x_1 + a_2 x_2 + \cdots + a_d x_d$ from an input vector
$(x_1, x_2, \ldots, x_d)$ and yields one of two outputs depending on whether or not the
sum is positive. A pattern classification machine may utilize a linear
threshold element along with a controller which receives whichever of the
two values corresponds to the correct classification of the input vector.
The purpose of the controller is to modify the gain vector $(a_0, a_1, \ldots, a_d)$
so that the next input vector has a greater likelihood of being
correctly classified by the threshold element.

This likelihood depends on the value of the gain vector, and an
adaptive algorithm of the "steepest descent" variety can be used to
attempt to adjust the gain vector to its optimal value as the machine is
exposed to a stationary sequence of statistically independent input
vectors. The components of these vectors are commonly two valued, and it
has been shown that convergence of the expected value of the gain vector
is dependent on the value of the adjustment parameter, the values of the
components, and the distribution of the input vectors. It is shown herein
that a bound on the adjustment parameter, simply related to the values of
the input components, is sufficient to ensure this convergence. The
variance of the gain vector is derived under the assumptions of a uniform
input sequence and oppositely signed components of equal magnitude, and
it is shown that a similar bound on the adjustment parameter implies
convergence of the variance. The variance is graphed under representative
conditions.
1. Introduction.

"Linear threshold element" (LTE) is the generic term for a device
which forms the sum $a_0 + a_1 x_1 + a_2 x_2 + \cdots + a_d x_d$ from an input vector
$(x_1, x_2, x_3, \ldots, x_d)$ and yields one of two outputs depending on whether
or not the sum is positive. The components of the input vectors are
commonly two valued, and therefore the total number of possible input
vectors is $2^d$. When used in a pattern classification machine (PCM) the
output of the LTE classifies each input into one of two classifications.
This classification may or may not be correct. The correct classification
is given by the environment, a fixed but unknown function defined
on the same set of input vectors, whose output is either of the two
values of the LTE.
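The element just described can be sketched in a few lines. This is only an illustration, assuming components in {-1, +1}; the names `lte_output` and `all_inputs` and the gain values are hypothetical and do not come from the original text.

```python
from itertools import product

def lte_output(gain, x):
    """Return +1 if a_0 + a_1*x_1 + ... + a_d*x_d is positive, otherwise -1."""
    total = gain[0] + sum(a * xi for a, xi in zip(gain[1:], x))
    return 1 if total > 0 else -1

def all_inputs(d, values=(-1, 1)):
    """Enumerate the 2**d input vectors whose components are two valued."""
    return list(product(values, repeat=d))

inputs = all_inputs(3)
gain = [0.5, 1.0, 1.0, 1.0]   # hypothetical gain vector (a_0, a_1, a_2, a_3)
outputs = [lte_output(gain, x) for x in inputs]
```

With d = 3 the enumeration yields 8 vectors, and each receives one of the two output values.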

As an example of an environment, consider a handwritten letter of 
the alphabet which is "read" by an array of mark sensing devices. The 
input to the environment is the pattern from the sensing devices and 
the output is whether or not the letter which was read is a particular 
letter. Consider also medical diagnosis. The electrocardiogram of a
particular human heart may be sensed as above, and the output of the
environment is whether or not a particular anomaly is present. In either
case if the same sensing pattern is presented to an LTE, it will also 
make a classification. The number of similar applications is large. 

In addition to the LTE, a PCM may use a controller. This device
receives the environmental response to an input vector and attempts to
modify the gain vector $(a_0, a_1, a_2, \ldots, a_d)$ so that the next input
vector has a greater likelihood of being correctly classified by the
LTE. Several algorithms have been proposed to adjust the gain vector. [1]
The method under consideration is the steepest descent algorithm of
Widrow and Hoff. It will be discussed in section 2.

The gain vector is given an initial setting $a(1)$, and then the
PCM is exposed to a potentially infinite sequence of input vectors
$\{X(n)\}$ and the corresponding sequence of correct output values from
the environment $\{f[X(n)]\}$, $n = 1, 2, 3, \ldots$. After each input vector is
presented, the gain vector is adjusted by the controller to yield the
sequence $\{a(n)\}$. Under certain conditions, this sequence will
converge to a terminal vector $a^*$ which is optimal in some sense for the
correct classification of an input vector by the LTE.

It is assumed that the sequence of input vectors is a strictly
stationary stochastic sequence of independent random variables, i.e.
$X(n) = X$, where $X$ is a random variable whose statistical properties are
completely described by the probability vector $P = (p_1, p_2, \ldots, p_{2^d})$,
and $p_j = \Pr\{X = X^j\} > 0$, $j = 1, 2, \ldots, 2^d$, and $\sum_j p_j = 1$. A
frequent example is the uniform input sequence: $p_j = 2^{-d}$ for all $j$, which
implies that the occurrence of each of the $2^d$ input vectors is equally
likely.

The LTE is used to predict the environment, and a measure of its
ability to perform this task is the state of the PCM, $S(a)$, defined as
the expected value of the squared difference between the responses of
the LTE and the environment with respect to the random variable $X$. The
state of the PCM is zero if and only if the responses of the LTE and
the environment are equal for all input vectors. The task of the
controller is to minimize $S$ with respect to the gain vector.

Due to the discontinuity of the step function, the minimization of
$S$ will not be without difficulty. Consider an auxiliary measure of the
performance of the PCM, $Q(a)$, the expected value of the squared
difference between the response of the environment and the sum
$a_0 + a_1 x_1 + \cdots + a_d x_d$ which is internally generated by the LTE. This auxiliary
measure is the one chosen for minimization, and it is believed that the
following theorem is correct, although a satisfactory proof is not
known to exist.

Theorem 1. If $Q(a^*) \le Q(a)$ for all gain vectors $a$,
then $S(a^*) \le S(a)$ for all gain vectors $a$.
To summarize the assumptions and notation, let the possible values
of the components of the input vectors be $q$ and $r$; then the LTE, $g$, is
defined on the set

\[
B^d = \{ X^j : X^j = (x_0^j, x_1^j, \ldots, x_d^j)^T;\ x_0^j = \max[|q|, |r|];\ x_i^j \in \{q, r\},\ i = 1, 2, \ldots, d;\ j = 1, 2, \ldots, 2^d \}.
\]

Let $A^d = \{ a : a = (a_0, a_1, \ldots, a_d)^T \}$ be the set of gain vectors.
Then $g(X) = \mathrm{sgn}(a^T X)$, where

\[
\mathrm{sgn}(t) = \begin{cases} 1, & t > 0, \\ -1, & t \le 0, \end{cases}
\]

and $R = \{1, -1\}$ is the range set of $g$. The environment, $f$, also maps
$B^d$ into $R$. Note that each environment could be interpreted as one of
the $2^{2^d}$ Boolean functions of $d$ binary variables. The measures are as
follows:

\[
S(a) = \overline{[f(X) - g(X)]^2}, \qquad Q(a) = \overline{[f(X) - a^T X]^2}.
\]

$X$ is a random variable which takes on values in $B^d$, all with a positive
probability, in accordance with the probability vector $P$. Note that
all functions of $X$ are therefore random variables.


2. The Adaptive Algorithm 


The problem is to minimize

\[
Q(a) = \overline{[f(X) - a^T X]^2} = \overline{f^2(X)} - 2\sum_{i=0}^{d} a_i\,\overline{f(X)x_i} + \sum_{i=0}^{d}\sum_{j=0}^{d} a_i a_j\,\overline{x_i x_j}.
\]

Setting $\partial Q(a)/\partial a_i = 0$, $\forall i$, yields

\[
\overline{f(X)x_i} = \sum_{j=0}^{d} a_j\,\overline{x_i x_j}, \quad \forall i. \tag{1}
\]

That value of the gain vector which satisfies equation (1) is the
vector $a^*$, which minimizes $Q(a)$. But since $f$ is unknown, the equation
cannot be solved directly.
Using the method of steepest descent, the gain vector is modified
after each presentation of an input vector:

\[
a(n+1) = a(n) - \eta\,\mathrm{grad}\,Q_n,
\]

where $a(n)$ is the value of the gain vector at the time of the $n$th presentation;
$Q_n = \{ f[X(n)] - a^T(n)X(n) \}^2$;
$X(n)$ is the $n$th input vector;
$\mathrm{grad}\,Q_n = (\partial Q_n/\partial a_0, \partial Q_n/\partial a_1, \ldots, \partial Q_n/\partial a_d)^T$;
$\eta$ is a positive constant.

Each component is adjusted in the direction of decreasing $Q_n$.
Specifically,

\[
a_i(n+1) = a_i(n) + 2\eta\,x_i(n)\{ f[X(n)] - a^T(n)X(n) \}, \quad \forall i. \tag{2}
\]
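The update of equation (2) can be sketched as follows. This is a minimal illustration under stated assumptions: components are ±1 with the constant component x_0 = 1, inputs are drawn uniformly, and the environment `env`, the helper names, the seed and the step size are all hypothetical choices rather than anything given in the original.

```python
import random
from itertools import product

def lms_step(gain, x, target, eta):
    """One application of equation (2); x includes the constant component x_0."""
    error = target - sum(a * xi for a, xi in zip(gain, x))
    return [a + 2.0 * eta * xi * error for a, xi in zip(gain, x)]

def train(environment, d, eta, steps, seed=0):
    """Present `steps` uniformly drawn input vectors and adjust the gain after each."""
    rng = random.Random(seed)
    inputs = [(1,) + x for x in product((-1, 1), repeat=d)]
    gain = [0.0] * (d + 1)
    for _ in range(steps):
        x = rng.choice(inputs)
        gain = lms_step(gain, x, environment(x), eta)
    return gain

# Hypothetical environment: +1 exactly when the first sensed component is +1.
env = lambda x: x[1]
gain = train(env, d=3, eta=0.05, steps=2000)
```

Because this particular environment is itself a linear function of the input, the gain vector settles at (0, 1, 0, 0) with no residual error; a less regular environment would leave the gain vector fluctuating about its mean.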


It is to be noted that $a(n)$ is a random variable. Now show that $\overline{a(n)}$
converges to $a^*$. Martinez shows this as follows. [4]
Rewrite equation (2) as

\[
a(n+1) = a(n) + \beta[\, b_n - C_n a(n) \,], \tag{3}
\]

where the adjustment parameter $\beta = 2\eta$, $b_n = f[X(n)]X(n)$,
and $C_n = X(n)X^T(n)$. Note that $C_n^T = C_n$.
Hence

\[
a(n+1) = \beta b_n + [\, I - \beta C_n \,]\,a(n) = d_n + E_n a(n), \tag{4}
\]

where $d_n = \beta b_n$ and $E_n = I - \beta C_n$.


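Expanding recursively and taking expected values,

\[
\overline{a(n)} = \overline{d_{n-1}} + \overline{E_{n-1}d_{n-2}} + \overline{E_{n-1}E_{n-2}d_{n-3}} + \cdots + \overline{E_{n-1}E_{n-2}\cdots E_2 d_1} + \overline{E_{n-1}E_{n-2}\cdots E_1}\,a(1). \tag{5}
\]

Due to the assumptions of independence and stationarity on the
input sequence, equation (5) becomes

\[
\begin{aligned}
\overline{a(n)} &= [\, I + E + E^2 + \cdots + E^{n-2} \,]\,D + E^{n-1}a(1), \quad \text{where } E = \overline{E_n} \text{ and } D = \overline{d_n}, \\
&= [\, I - E \,]^{-1}[\, I - E^{n-1} \,]\,D + E^{n-1}a(1). 
\end{aligned} \tag{6}
\]

If $\lim_{n\to\infty} E^n = 0$, then

\[
\lim_{n\to\infty} \overline{a(n)} = [\, I - E \,]^{-1}D.
\]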
It is then shown that if $C = \overline{C_n}$ is positive definite, and if
$\beta < 2/\lambda$, where $\lambda$ is the largest eigenvalue of $C$, then
$\lim_{n\to\infty} E^n = 0$. Hence,
if $C$ is positive definite, and the positive constant $\eta$ is less than
the reciprocal of the largest eigenvalue of $C$, then the modification
given by equation (2) will cause $\overline{a(n)}$ to converge to $a^*$, since
$[\, I - E \,]^{-1}D = C^{-1}b$, where $b = \overline{b_n}$, and if $a'$ is the terminal value of
$\overline{a(n)}$, then $b = Ca'$, which is the minimizing equation (1).

Under what conditions is $C$ positive definite? Consider $C^j = X^j X^{jT}$
and let $Z$ be a non zero vector. Then

\[
Z^T C^j Z = Z^T X^j X^{jT} Z = (Z^T X^j)^2 \ge 0,
\]

which shows that $C^j$ is positive semi definite for all $j$.
$C = \overline{C_n} = \sum_j p_j C^j$, and if $Z$ is as above, then

\[
Z^T C Z = \sum_j p_j Z^T C^j Z \ge 0.
\]

Therefore $C$ is always positive semi definite. That it is, in fact,
always positive definite is shown by contradiction. Assume there exists a
non zero vector $W$ such that $W^T C W = 0$.
Then $p_j W^T C^j W = 0$ for all $j$. Since $p_j > 0$ for all $j$,
$W^T C^j W = 0$ for all $j$, which implies that
$(W^T X^j)^2 = 0$ for all $j$, and

\[
W^T X^j = 0 \quad \text{for all } j. \tag{7}
\]

Equation (7) is a linear system of $2^d$ equations, and if the vector $W$ is
a solution, then it must satisfy a subsystem of $d+1$ of the equations.
Let $X^{b_1}, X^{b_2}, \ldots, X^{b_{d+1}}$ be a set of linearly independent vectors from
$B^d$. Such a set exists since $B^d$ spans $(d+1)$-space. Hence

\[
W^T X^{b_i} = 0, \quad i = 1, 2, \ldots, d+1. \tag{8}
\]

Now a nontrivial solution to (8) exists if and only if

\[
\det[\, X^{b_1}\ X^{b_2}\ \cdots\ X^{b_{d+1}} \,] = 0.
\]

But this set of vectors is linearly independent and their determinant
is non zero. Hence for all non zero vectors $W$, $W^T C W \ne 0$, and $C$ is
always positive definite.

What is the range of the eigenvalues of $C$? For each $C^j$,
$\mathrm{diag}(C^j) = ((x_0^j)^2, (x_1^j)^2, \ldots, (x_d^j)^2)$, and it follows that
$\mathrm{diag}(C) = (c_0, c_1, \ldots, c_d)$, where $c_0 = r^2$ and
$q^2 \le c_i \le r^2$ for $i = 1, 2, \ldots, d$.
Hence

\[
\mathrm{trace}(C) = \sum_{i=0}^{d} c_i \le (d+1)r^2.
\]

Let $\lambda_0, \lambda_1, \ldots, \lambda_d$ be the eigenvalues of $C$. Then, since
$\mathrm{trace}(C) = \sum_{i=0}^{d} \lambda_i$ and $\lambda_i > 0$ for all $i$,

\[
\lambda = \max_i \lambda_i \le \sum_{i=0}^{d} \lambda_i \le (d+1)r^2.
\]

It then follows that if the adjustment parameter $\beta < \frac{2}{(d+1)r^2}$, then
$\overline{a(n)}$ will converge to $a^*$, since $C$ is always positive definite and

\[
\beta < \frac{2}{(d+1)r^2} \le \frac{2}{\lambda}.
\]
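The two facts used above (that $C$ is positive definite, and that its largest eigenvalue does not exceed its trace, which in turn does not exceed $(d+1)r^2$) can be checked numerically. The sketch below is illustrative only: the component values q = 0 and r = 1, the randomly drawn positive probabilities, and all function names are assumptions made for the example. Positive definiteness is tested through the pivots of Gaussian elimination without row exchanges, and the largest eigenvalue is estimated by power iteration.

```python
import random
from itertools import product

def correlation_matrix(inputs, probs):
    """C = sum_j p_j X^j X^jT for input vectors that include x_0."""
    k = len(inputs[0])
    return [[sum(p * x[i] * x[j] for p, x in zip(probs, inputs))
             for j in range(k)] for i in range(k)]

def elimination_pivots(c):
    """Pivots of elimination without row exchanges; all positive iff c is positive definite."""
    a = [row[:] for row in c]
    k = len(a)
    pivots = []
    for s in range(k):
        pivots.append(a[s][s])
        if a[s][s] <= 0.0:
            break
        for i in range(s + 1, k):
            factor = a[i][s] / a[s][s]
            for j in range(s, k):
                a[i][j] -= factor * a[s][j]
    return pivots

def largest_eigenvalue(c, iterations=500):
    """Power iteration estimate of the largest eigenvalue of a symmetric matrix."""
    v = [1.0] * len(c)
    lam = 0.0
    for _ in range(iterations):
        w = [sum(cij * vj for cij, vj in zip(row, v)) for row in c]
        lam = max(abs(wi) for wi in w)
        v = [wi / lam for wi in w]
    return lam

rng = random.Random(1)
d, q, r = 3, 0.0, 1.0
inputs = [(r,) + x for x in product((q, r), repeat=d)]
weights = [rng.uniform(0.5, 1.5) for _ in inputs]
probs = [w / sum(weights) for w in weights]      # every p_j > 0
c = correlation_matrix(inputs, probs)
trace = sum(c[i][i] for i in range(d + 1))
beta_bound = 2.0 / ((d + 1) * r ** 2)            # sufficient bound on beta
```

The bound `beta_bound` depends only on d and r, which is the point of the argument: it can be applied without knowing the distribution of the input vectors.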


The remainder of this section examines the convergence under
the following assumptions.
(1) The components of the input vectors are of equal magnitude
and oppositely signed, i.e. $0 < -q = r$.
(2) The input sequence is uniform, i.e. $p_j = 2^{-d}$ for all $j$.

With these assumptions $C$ will be shown to reduce to $r^2 I$, which
simplifies further analysis without sacrificing a great deal of applicability,
since any PCM can be reworked to make the transformation of (1) above,
and (2) above is a common ad hoc approach to a practical situation.
Consider the following constructive scheme for the input vectors,
which is analogous to the binary representation of the integers.

\[
\begin{aligned}
X^1 &= (r, -r, -r, \ldots, -r, -r, -r)^T \\
X^2 &= (r, -r, -r, \ldots, -r, -r, r)^T \\
X^3 &= (r, -r, -r, \ldots, -r, r, -r)^T \\
X^4 &= (r, -r, -r, \ldots, -r, r, r)^T \\
&\ \ \vdots \\
X^{2^d} &= (r, r, r, \ldots, r, r, r)^T
\end{aligned}
\]

This scheme could also be displayed as the matrix $M = [m_{ij}]$ with each
element of the form $m_{ij} = x_i^j / r$, where (contrary to the usual convention)
$i$ is the column index, which refers to the components of a particular
input vector, and $j$ is the row index, which refers to a particular

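vector from $B^d$. The order of $M$ is $2^d \times (d+1)$. An analytic expression
for $m_{ij}$ is

\[
m_{ij} = \begin{cases} -1, & \text{if } i \ge 1 \text{ and } (j-1) \bmod 2^{\,d-i+1} < 2^{\,d-i}, \\ 1, & \text{otherwise.} \end{cases}
\]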
With the columns indexed $0, 1, 2, \ldots, d-2, d-1, d$ from left to right,

\[
M = \left[\begin{array}{rrrcrrr}
1 & -1 & -1 & \cdots & -1 & -1 & -1 \\
1 & -1 & -1 & \cdots & -1 & -1 & 1 \\
1 & -1 & -1 & \cdots & -1 & 1 & -1 \\
1 & -1 & -1 & \cdots & -1 & 1 & 1 \\
1 & -1 & -1 & \cdots & 1 & -1 & -1 \\
\vdots & & & & & & \vdots \\
1 & -1 & 1 & \cdots & 1 & 1 & 1 \\
1 & 1 & -1 & \cdots & -1 & -1 & -1 \\
\vdots & & & & & & \vdots \\
1 & 1 & 1 & \cdots & 1 & 1 & 1
\end{array}\right]
\]

The rows shown are rows 1 through 5, rows $2^{d-1}$ and $2^{d-1}+1$, and row $2^d$.


Now it is necessary to show that the column vectors of $M$ are
mutually orthogonal. Using the method of induction on the dimension of
the PCM, let $d = 1$. Then

\[
M_1 = \begin{bmatrix} 1 & -1 \\ 1 & 1 \end{bmatrix} = (N_0\ N_1),
\]

where $N_0$ is the column vector $(1, 1)^T$ and $N_1$ is the column vector
$(-1, 1)^T$. There is only one pair of vectors to check:
$N_0^T N_1 = -1 + 1 = 0$. The vectors are orthogonal. Now let $d = \ell$
and $M_\ell = (C_0\ C_1\ C_2\ \cdots\ C_\ell)$, where $C_i$ is the $i$th column vector of $M_\ell$. The
induction assumption is that the column vectors of $M_\ell$ are mutually
orthogonal. Then if $K_i$ is the $i$th column vector for the dimension $\ell + 1$,
$M_{\ell+1} = (K_0\ K_1\ K_2\ \cdots\ K_{\ell+1})$, which can be partitioned
with respect to rows as

\[
M_{\ell+1} = \begin{bmatrix} C_0 & -C_0 & C_1 & C_2 & \cdots & C_\ell \\ C_0 & C_0 & C_1 & C_2 & \cdots & C_\ell \end{bmatrix}.
\]

With this partitioning any column vector of $M_{\ell+1}$ can be expressed as a
direct sum (a physical concatenation) of the column vectors of $M_\ell$, to wit:
$K_0 = C_0 \oplus C_0$, $K_1 = (-C_0) \oplus C_0$, and $K_i = C_{i-1} \oplus C_{i-1}$ for
$i = 2, 3, \ldots, \ell+1$. The product $K_0^T K_1 = -C_0^T C_0 + C_0^T C_0 = 0$. Then any
product of the form $K_0^T K_i$ for $i = 2, 3, \ldots, \ell+1$ can
be expressed as

\[
(C_0 \oplus C_0)^T (C_{i-1} \oplus C_{i-1}) = C_0^T C_{i-1} + C_0^T C_{i-1} = 0.
\]

Similarly, any product of the form $K_1^T K_i$ for $i = 2, 3, \ldots, \ell+1$ can be
expressed as

\[
((-C_0) \oplus C_0)^T (C_{i-1} \oplus C_{i-1}) = -C_0^T C_{i-1} + C_0^T C_{i-1} = 0,
\]

and any product of the form $K_i^T K_j$ for $1 < i < j \le \ell + 1$ can be expressed as

\[
(C_{i-1} \oplus C_{i-1})^T (C_{j-1} \oplus C_{j-1}) = C_{i-1}^T C_{j-1} + C_{i-1}^T C_{j-1} = 2\,C_{i-1}^T C_{j-1} = 0,
\]

since all the $C_i$ are orthogonal.

Thus the column vectors of $M_{\ell+1}$ are mutually orthogonal, and by induction
it is true that for all positive integral values of $d$, the columns of the
matrix $M$ are mutually orthogonal.
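The orthogonality established by the induction can be confirmed directly for a small dimension. The sketch below builds the rows of $M$ in binary counting order and forms all column inner products; the function names `build_m` and `gram` are illustrative, and d = 3 is an arbitrary choice.

```python
from itertools import product

def build_m(d):
    """Rows are the 2**d input vectors divided by r, in binary counting order."""
    return [(1,) + bits for bits in product((-1, 1), repeat=d)]

def gram(m):
    """Matrix of inner products between every pair of columns of m."""
    cols = list(zip(*m))
    return [[sum(u * v for u, v in zip(ci, cj)) for cj in cols] for ci in cols]

m = build_m(3)
g = gram(m)
```

Every off-diagonal inner product is zero, and each diagonal entry equals the number of rows, $2^d$.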





Now consider the elements of $C = \sum_{k=1}^{2^d} p_k C^k$, where $p_k = 2^{-d}$ and the
input components are $\pm r$.

\[
c_{ij} = \sum_{k=1}^{2^d} p_k c^k_{ij} = 2^{-d}\sum_{k=1}^{2^d} (X^k X^{kT})_{ij} = 2^{-d}\sum_{k=1}^{2^d} x_i^k x_j^k = 2^{-d} r^2 \sum_{k=1}^{2^d} m_{ik} m_{jk}, \quad \text{since } m_{ik} = x_i^k / r.
\]

This reduces to $c_{ij} = r^2 \delta_{ij}$, where $\delta_{ij}$ is the Kronecker delta,
since $\sum_{k=1}^{2^d} m_{ik} m_{jk} = 0$ if $i \ne j$, due to the orthogonality of the
column vectors of the matrix $M$, and equals $2^d$ if $i = j$. Hence $C = r^2 I$.


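Under these assumptions equation (6) becomes

\[
\begin{aligned}
\overline{a(n)} &= [\, I - E \,]^{-1}[\, I - E^{n-1} \,]\,D + E^{n-1}a(1) \\
&= [\beta C]^{-1}[\, I - (I - \beta C)^{n-1} \,]\,\beta b + [\, I - \beta C \,]^{n-1}a(1) \\
&= [\beta r^2 I]^{-1}[\, I - (I - \beta r^2 I)^{n-1} \,]\,\beta b + [\, I - \beta r^2 I \,]^{n-1}a(1) \\
&= \frac{1}{r^2}[\, 1 - (1 - \beta r^2)^{n-1} \,]\,b + (1 - \beta r^2)^{n-1}a(1) \\
&= \frac{1}{r^2}[\, 1 - (1 - \beta r^2)^{n-1} \,]\,Ca^* + (1 - \beta r^2)^{n-1}a(1) \\
&= [\, 1 - (1 - \beta r^2)^{n-1} \,]\,a^* + (1 - \beta r^2)^{n-1}a(1) \qquad (9) \\
&= a^* - [\, a^* - a(1) \,](1 - \beta r^2)^{n-1}. \qquad (10)
\end{aligned}
\]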
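Equation (10) can be compared against a direct iteration of the mean recursion $\overline{a(n+1)} = \beta b + (I - \beta C)\,\overline{a(n)}$, with $b$ and $C$ obtained by enumerating the equally likely inputs. The sketch below is illustrative: the majority-vote environment, the values of d, r, β and a(1), and the function names are assumptions made for the example.

```python
from itertools import product

def mean_gain_sequence(f, d, r, beta, a1, n):
    """Iterate the mean recursion exactly for a uniform input sequence; return (mean a(n), b)."""
    k = d + 1
    inputs = [(r,) + x for x in product((-r, r), repeat=d)]
    p = 1.0 / len(inputs)
    b = [sum(p * f(x) * x[i] for x in inputs) for i in range(k)]
    c = [[sum(p * x[i] * x[j] for x in inputs) for j in range(k)] for i in range(k)]
    a = list(a1)
    for _ in range(n - 1):
        a = [a[i] + beta * (b[i] - sum(c[i][j] * a[j] for j in range(k)))
             for i in range(k)]
    return a, b

def closed_form_mean(b, r, beta, a1, n):
    """Equation (10): a* - [a* - a(1)] (1 - beta r^2)^(n-1), with a* = b / r^2."""
    a_star = [bi / r ** 2 for bi in b]
    decay = (1.0 - beta * r ** 2) ** (n - 1)
    return [s - (s - a0) * decay for s, a0 in zip(a_star, a1)]

# Hypothetical environment: majority vote of the d sensed components.
majority = lambda x: 1 if sum(x[1:]) > 0 else -1
d, r, beta, n = 3, 2.0, 0.1, 10          # beta < 2/((d+1) r^2) = 0.125
a1 = [0.5, -0.5, 0.0, 1.0]
iterated, b = mean_gain_sequence(majority, d, r, beta, a1, n)
closed = closed_form_mean(b, r, beta, a1, n)
```

The two results agree to rounding error, and both approach $a^* = b/r^2$ as n grows.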


Equation (1) can be written as $b = Ca^*$, which becomes $b = r^2 a^*$, and
the optimal value of the gain vector is $a^* = \frac{b}{r^2}$. Then

\[
\begin{aligned}
Q(a^*) &= \overline{f^2(X)} - 2\sum_{i=0}^{d} a_i^*\,\overline{f(X)x_i} + \sum_{i=0}^{d}\sum_{k=0}^{d} a_i^* a_k^*\,\overline{x_i x_k} \\
&= 1 - 2\sum_{i=0}^{d} a_i^* r^2 a_i^* + \sum_{i=0}^{d}\sum_{k=0}^{d} a_i^* a_k^* r^2 \delta_{ik} \\
&= 1 - r^2 (a^*)^2.
\end{aligned} \tag{11}
\]

From equation (11) it is evident that if $Q(a^*)$, the optimum of the
minimization effort, is near zero, then $(a^*)^2$ is near $1/r^2$. Note also the
range of $(a^*)^2$, between zero and $1/r^2$.
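Equation (11) can be checked by evaluating $Q(a^*)$ directly from its definition. The sketch below assumes a uniform input sequence with components ±r and uses a hypothetical majority-vote environment; the names `q_measure` and `majority` and the values of r and d are illustrative.

```python
from itertools import product

def q_measure(f, gain, inputs):
    """Q(a): mean squared difference between f(X) and a^T X over equally likely inputs."""
    return sum((f(x) - sum(a * xi for a, xi in zip(gain, x))) ** 2
               for x in inputs) / len(inputs)

r, d = 2.0, 3
inputs = [(r,) + x for x in product((-r, r), repeat=d)]
majority = lambda x: 1 if sum(x[1:]) > 0 else -1   # hypothetical environment
# a* = b / r^2, where b_i is the mean of f(X) x_i over the uniform inputs
a_star = [sum(majority(x) * x[i] for x in inputs) / (len(inputs) * r ** 2)
          for i in range(d + 1)]
q_min = q_measure(majority, a_star, inputs)
q_from_eq11 = 1.0 - r ** 2 * sum(a * a for a in a_star)
```

For this environment both evaluations give $Q(a^*) = 0.25$, so the sum formed by the LTE never reproduces the environment exactly even though its sign always does.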




3. The Variance of the Gain Vector.





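Consider the variance of $a(n)$, $V[a(n)] = \overline{a^2(n)} - \overline{a(n)}^2$, where
$a^2(n)$ denotes $a^T(n)a(n)$. It is
necessary to compute $\overline{a^2(n)}$, which will be done under assumption (1),
the components are $\pm r$. From equation (4),

\[
a(n+1) = d_n + E_n a(n),
\]

where $d_n = \beta b_n = \beta f[X(n)]X(n)$,
$E_n = I - \beta C_n = I - \beta X(n)X^T(n)$,
and let $0 < \beta < \frac{2}{(d+1)r^2}$.
Then

\[
a^2(n+1) = [\, d_n^T + a^T(n)E_n^T \,][\, d_n + E_n a(n) \,] = d_n^2 + 2 d_n^T E_n a(n) + a^T(n)E_n^T E_n a(n),
\]

which can be written as

\[
a^2(n+1) = \beta^2 r^2 (d+1) + 2[\, 1 - \beta r^2 (d+1) \,]\,d_n^T a(n) + a^T(n)\{ I + \beta[\, \beta r^2 (d+1) - 2 \,]\,C_n \}\,a(n), \tag{12}
\]

since

\[
d_n^2 = \beta b_n^T \beta b_n = \beta^2 f[X(n)]X^T(n) f[X(n)]X(n) = \beta^2 X^T(n)X(n) = \beta^2 r^2 (d+1),
\]

\[
\begin{aligned}
d_n^T E_n = d_n^T[\, I - \beta C_n \,] &= \beta f[X(n)]X^T(n)[\, I - \beta X(n)X^T(n) \,] \\
&= \beta f[X(n)][\, X^T(n) - \beta r^2 (d+1) X^T(n) \,] \\
&= [\, 1 - \beta r^2 (d+1) \,]\,d_n^T,
\end{aligned}
\]

and

\[
\begin{aligned}
E_n^T E_n &= I - 2\beta C_n + \beta^2 C_n C_n \\
&= I - 2\beta C_n + \beta^2 r^2 (d+1) C_n \\
&= I + \beta[\, \beta r^2 (d+1) - 2 \,]\,C_n.
\end{aligned}
\]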

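Taking the expected value of equation (12),

\[
\overline{a^2(n+1)} = \beta^2 r^2 (d+1) + 2[\, 1 - \beta r^2 (d+1) \,]\,\overline{d_n^T a(n)} + \overline{a^2(n)} + \beta[\, \beta r^2 (d+1) - 2 \,]\,\overline{a^T(n)C_n a(n)}. \tag{13}
\]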


Before considering the expected values of $d_n^T a(n)$ and $a^T(n)C_n a(n)$,
recall the following theorem, as stated by Halmos. [6]
If $\{ f_{ij} : i = 1, \ldots, k;\ j = 1, 2, \ldots, n_i \}$ is a set of
independent functions, if $\varphi_i$ is a real valued, Borel measurable function of
$n_i$ real variables, $i = 1, 2, \ldots, k$, and if $f_i(x) = \varphi_i(f_{i1}(x), \ldots, f_{i n_i}(x))$,
then the functions $f_1, \ldots, f_k$ are independent.

Now consider the set of independent random variables $\{X(1), X(2), \ldots, X(n-1), X(n)\}$.
By definition $d_n = \beta f[X(n)]X(n)$ and is defined on the
subset $\{X(n)\}$. On the other hand $a(n)$ is defined on the complementary
subset $\{X(1), \ldots, X(n-1)\}$ by the recursion relation of equation (2).
Therefore, applying the theorem to each component of these vectors, the
conclusion is that $a(n)$ and $d_n$ are independent random variables. This
being the case, the expected value of $d_n^T a(n)$ is the product of the
expected values of $d_n$ and $a(n)$. Hence,

\[
\overline{d_n^T a(n)} = \overline{d_n}^T\,\overline{a(n)} = \beta b^T\,\overline{a(n)}.
\]

Then using $b = Ca^*$ and equation (6),

\[
\begin{aligned}
\overline{d_n^T a(n)} &= \beta (a^*)^T C \{ [\, I - (I - \beta C)^{n-1} \,]\,C^{-1}b + (I - \beta C)^{n-1}a \}, \quad \text{where } a = a(1), \\
&= \beta (a^*)^T C \{ [\, I - (I - \beta C)^{n-1} \,]\,a^* + (I - \beta C)^{n-1}a \} \\
&= \beta (a^*)^T C a^* - \beta (a^*)^T C [\, I - \beta C \,]^{n-1}(a^* - a).
\end{aligned} \tag{14}
\]


Next consider the expected value of $a^T(n)C_n a(n)$.

\[
\overline{a^T(n)C_n a(n)} = \overline{\sum_{i=0}^{d}\sum_{j=0}^{d} a_i(n)x_i(n)x_j(n)a_j(n)} = \sum_{i=0}^{d}\sum_{j=0}^{d} \overline{a_i(n)a_j(n)}\;\overline{x_i(n)x_j(n)}, \tag{15}
\]

since $a(n)$ and $X(n)$ satisfy the hypothesis of the independence
theorem above.


The goal now is to express this expected value as $\overline{a^2(n)}\,G(C)$, where $G$
is some unknown function of $C$. Equation (15) is close, but it contains
terms in $\overline{a_i(n)a_j(n)}$ with $i \ne j$, which have not yielded to analysis. Is it
possible that these cross product terms vanish? Or is it possible that
their coefficients $\overline{x_i(n)x_j(n)}$ vanish for $i \ne j$? The latter is, in fact,
exactly the conclusion arrived at by assumption (2), the uniform input
sequence. Under this assumption the derivation can proceed, and it will,
justified by the urgent need for some results, however special.





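With $C = r^2 I$ used in the expected value terms, equation (15) becomes

\[
\overline{a^T(n)C_n a(n)} = \sum_{i=0}^{d} r^2\,\overline{a_i^2(n)} = r^2\,\overline{a^2(n)},
\]

and equation (14) becomes

\[
\overline{d_n^T a(n)} = \beta r^2 (a^*)^2 - \beta r^2 (a^*)^T(a^* - a)(1 - \beta r^2)^{n-1}.
\]

Finally returning to equation (13),

\[
\begin{aligned}
\overline{a^2(n+1)} = {} & \beta^2 r^2 (d+1) + 2\beta r^2 [\, 1 - \beta r^2 (d+1) \,](a^*)^2 \\
& - 2\beta r^2 [\, 1 - \beta r^2 (d+1) \,](a^*)^T(a^* - a)(1 - \beta r^2)^{n-1} \\
& + \{ 1 + \beta r^2 [\, \beta r^2 (d+1) - 2 \,] \}\,\overline{a^2(n)}.
\end{aligned} \tag{16}
\]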


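The structure of equation (16) is more readily apparent if the
following substitutions are made for the constants of the process.
Let

\[
\begin{aligned}
\sigma &= \beta^2 r^2 (d+1) + 2\beta r^2 [\, 1 - \beta r^2 (d+1) \,](a^*)^2, \\
\gamma &= -2\beta r^2 [\, 1 - \beta r^2 (d+1) \,](a^*)^T(a^* - a), \\
\delta &= (1 - \beta r^2)^2 + \beta^2 r^4 d, \text{ and} \\
\rho &= (1 - \beta r^2).
\end{aligned}
\]

Then

\[
\overline{a^2(n+1)} = \sigma + \gamma\rho^{n-1} + \delta\,\overline{a^2(n)}, \tag{17}
\]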


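which when reworked recursively one step becomes

\[
\begin{aligned}
\overline{a^2(n+1)} &= \sigma + \gamma\rho^{n-1} + \delta[\, \sigma + \gamma\rho^{n-2} + \delta\,\overline{a^2(n-1)} \,] \\
&= \sigma(1 + \delta) + \gamma(\rho^{n-1} + \delta\rho^{n-2}) + \delta^2\,\overline{a^2(n-1)},
\end{aligned}
\]

and one more step,

\[
\overline{a^2(n+1)} = \sigma(1 + \delta + \delta^2) + \gamma(\rho^{n-1} + \delta\rho^{n-2} + \delta^2\rho^{n-3}) + \delta^3\,\overline{a^2(n-2)},
\]

and through all the steps back to $a(1)$, is

\[
\overline{a^2(n+1)} = \sigma\sum_{i=0}^{n-1}\delta^i + \gamma\sum_{i=0}^{n-1}\rho^{n-1-i}\delta^i + \delta^n a^2, \quad \text{where } a^2 = a^T(1)a(1). \tag{18}
\]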
i=0 i=0 


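The geometric series may be put into closed form, yielding

\[
\sum_{i=0}^{n-1}\delta^i = \frac{1 - \delta^n}{1 - \delta}, \quad \text{and} \quad \sum_{i=0}^{n-1}\rho^{n-1-i}\delta^i = \frac{\rho^n - \delta^n}{\rho - \delta}.
\]

Now replacing $n$ by $n-1$, and making the above substitutions in equation
(18),

\[
\overline{a^2(n)} = \sigma\left[ \frac{1 - \delta^{n-1}}{1 - \delta} \right] + \gamma\left[ \frac{\rho^{n-1} - \delta^{n-1}}{\rho - \delta} \right] + \delta^{n-1}a^2. \tag{19}
\]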


The denominators in equation (19) can be rewritten as follows.

\[
1 - \delta = \beta r^2 [\, 2 - \beta r^2 (d+1) \,], \quad \text{and} \quad \rho - \delta = \beta r^2 [\, 1 - \beta r^2 (d+1) \,].
\]


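Now making all the substitutions for $\sigma$, $\gamma$, and $\rho$, and cancelling the
new forms of the denominators,

\[
\begin{aligned}
\overline{a^2(n)} = {} & \{ \beta (d+1) + 2[\, 1 - \beta r^2 (d+1) \,](a^*)^2 \}\left\{ \frac{1 - \delta^{n-1}}{2 - \beta r^2 (d+1)} \right\} \\
& - 2(a^*)^T(a^* - a)\{ (1 - \beta r^2)^{n-1} - \delta^{n-1} \} + \delta^{n-1}a^2.
\end{aligned} \tag{20}
\]

Collecting all the terms in $\delta^{n-1}$ and $(1 - \beta r^2)^{n-1}$,

\[
\begin{aligned}
\overline{a^2(n)} = {} & \frac{\beta (d+1) + 2[\, 1 - \beta r^2 (d+1) \,](a^*)^2}{2 - \beta r^2 (d+1)} \\
& + \frac{2(a^* - a)^2 + \beta (d+1)[\, r^2 (2(a^*)^T a - a^2) - 1 \,]}{2 - \beta r^2 (d+1)}\,\delta^{n-1} \\
& - 2(a^*)^T(a^* - a)(1 - \beta r^2)^{n-1}.
\end{aligned} \tag{21}
\]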


Equation (10) provides $\overline{a(n)}$ under assumptions (1) and (2), which
when squared yields

\[
\overline{a(n)}^2 = (a^*)^2 - 2(a^*)^T(a^* - a)(1 - \beta r^2)^{n-1} + (a^* - a)^2 (1 - \beta r^2)^{2(n-1)}. \tag{22}
\]


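Combining equations (21) and (22), the variance of the gain vector is

\[
\begin{aligned}
V[a(n)] = \overline{a^2(n)} - \overline{a(n)}^2 = {} & \frac{\beta (d+1) + 2[\, 1 - \beta r^2 (d+1) \,](a^*)^2}{2 - \beta r^2 (d+1)} \\
& + \frac{2(a^* - a)^2 + \beta (d+1)[\, r^2 (2(a^*)^T a - a^2) - 1 \,]}{2 - \beta r^2 (d+1)}\,\delta^{n-1} \\
& - (a^*)^2 - (a^* - a)^2 (1 - \beta r^2)^{2(n-1)}.
\end{aligned} \tag{23}
\]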


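Combining the constant terms leaves

\[
\begin{aligned}
V[a(n)] = {} & \frac{\beta (d+1)[\, 1 - r^2 (a^*)^2 \,]}{2 - \beta r^2 (d+1)} \\
& + \frac{2(a^* - a)^2 + \beta (d+1)[\, r^2 (2(a^*)^T a - a^2) - 1 \,]}{2 - \beta r^2 (d+1)}\,\delta^{n-1} \\
& - (a^* - a)^2 (1 - \beta r^2)^{2(n-1)},
\end{aligned} \tag{24}
\]

where $\delta = (1 - \beta r^2)^2 + \beta^2 r^4 d$.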
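Equation (24) can be compared with the variance obtained by propagating the first and second moments of a(n) exactly, averaging over all equally likely inputs at each step. The sketch below is illustrative only: it assumes both assumptions (1) and (2), a hypothetical majority-vote environment, and arbitrarily chosen values of d, r, β, a(1) and n; the function names are not from the original.

```python
from itertools import product

def exact_variance(f, d, r, beta, a1, n):
    """V[a(n)] from exact moments E[a] and E[a a^T] under a uniform input sequence."""
    k = d + 1
    inputs = [(r,) + x for x in product((-r, r), repeat=d)]
    p = 1.0 / len(inputs)
    mean = list(a1)
    second = [[a1[i] * a1[j] for j in range(k)] for i in range(k)]
    for _ in range(n - 1):
        new_mean = [0.0] * k
        new_second = [[0.0] * k for _ in range(k)]
        for x in inputs:
            dn = [beta * f(x) * xi for xi in x]                       # d_n
            en = [[(1.0 if i == j else 0.0) - beta * x[i] * x[j]      # E_n
                   for j in range(k)] for i in range(k)]
            em = [sum(en[i][j] * mean[j] for j in range(k)) for i in range(k)]
            es = [[sum(en[i][l] * second[l][j] for l in range(k))
                   for j in range(k)] for i in range(k)]
            ese = [[sum(es[i][l] * en[j][l] for l in range(k))        # E_n P E_n^T
                    for j in range(k)] for i in range(k)]
            for i in range(k):
                new_mean[i] += p * (dn[i] + em[i])
                for j in range(k):
                    new_second[i][j] += p * (dn[i] * dn[j] + dn[i] * em[j]
                                             + em[i] * dn[j] + ese[i][j])
        mean, second = new_mean, new_second
    return sum(second[i][i] for i in range(k)) - sum(m * m for m in mean)

def closed_form_variance(a_star, a1, d, r, beta, n):
    """Equation (24), with a = a(1)."""
    k = d + 1
    dot = lambda u, v: sum(ui * vi for ui, vi in zip(u, v))
    diff = [s - a for s, a in zip(a_star, a1)]
    denom = 2.0 - beta * r ** 2 * k
    delta = (1.0 - beta * r ** 2) ** 2 + beta ** 2 * r ** 4 * d
    const = beta * k * (1.0 - r ** 2 * dot(a_star, a_star)) / denom
    coeff = (2.0 * dot(diff, diff)
             + beta * k * (r ** 2 * (2.0 * dot(a_star, a1) - dot(a1, a1)) - 1.0)) / denom
    return (const + coeff * delta ** (n - 1)
            - dot(diff, diff) * (1.0 - beta * r ** 2) ** (2 * (n - 1)))

majority = lambda x: 1 if sum(x[1:]) > 0 else -1    # hypothetical environment
d, r, beta, n = 3, 1.0, 0.2, 6                      # beta < 2/((d+1) r^2) = 0.5
a1 = [0.3, -0.2, 0.1, 0.0]
inputs = [(r,) + x for x in product((-r, r), repeat=d)]
a_star = [sum(majority(x) * x[i] for x in inputs) / (len(inputs) * r ** 2)
          for i in range(d + 1)]
v_exact = exact_variance(majority, d, r, beta, a1, n)
v_closed = closed_form_variance(a_star, a1, d, r, beta, n)
```

The moment recursion makes no use of $C = r^2 I$ or of the closed forms, so agreement between `v_exact` and `v_closed` is an independent check on the algebra leading to equation (24).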


Now that the variance of the gain vector has been derived, the
question is whether or not it converges, and if so, to what, and under
what conditions? The bounds on $\beta$ which imply convergence of the mean
are zero and $\frac{2}{r^2}$ (under both assumptions). Hence the bounds on
$(1 - \beta r^2)^2$ are zero and one. Therefore
$\lim_{n\to\infty} [\, (1 - \beta r^2)^2 \,]^{n-1} = 0$. On
the other hand $\delta$ is a quadratic expression in $\beta$ which is less than one for
$\beta$ between zero and $\frac{2}{(d+1)r^2}$, and therefore this lesser bound must be
observed in order for the term in $\delta$ to vanish, and consequently ensure
the convergence of the variance of the gain vector. Hence, if both
assumptions (1) and (2) are valid, and if $0 < \beta < \frac{2}{(d+1)r^2}$, then both
$\overline{a(n)}$ and $V[a(n)]$ will converge.
The mean will converge to $a^*$, that $a$ which minimizes $Q(a)$, and the
variance to

\[
V = \lim_{n\to\infty} V[a(n)] = \frac{\beta (d+1)[\, 1 - r^2 (a^*)^2 \,]}{2 - \beta r^2 (d+1)}. \tag{25}
\]


Now recall the relationship from equation (11) between $a^*$ and $Q(a^*)$,
which enables equation (25) to be written as

\[
V = \frac{\beta (d+1) Q(a^*)}{2 - \beta r^2 (d+1)},
\]

where the bounds on $Q$ are zero and one, and $Q$
can be interpreted as a measure of the complexity of the environment
to which the PCM is exposed. Since the derivative of $V$ with respect to
$\beta$ is

\[
\frac{2(d+1)Q(a^*)}{[\, 2 - \beta r^2 (d+1) \,]^2},
\]

which is non negative, and since $V$ vanishes
for $\beta = 0$, the smaller $\beta$, the smaller $V$.
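The dependence of the limiting variance on β can be tabulated with a short function. This is a sketch under both assumptions (1) and (2); the function name `limiting_variance` and the sample values d = 3, r = 1 and Q(a*) = 0.25 are illustrative choices.

```python
def limiting_variance(beta, d, r, q_min):
    """V = beta (d+1) Q(a*) / (2 - beta r^2 (d+1)), valid for 0 <= beta < 2/((d+1) r^2)."""
    bound = 2.0 / ((d + 1) * r ** 2)
    if not 0.0 <= beta < bound:
        raise ValueError("beta must lie in [0, 2/((d+1) r^2))")
    return beta * (d + 1) * q_min / (2.0 - beta * r ** 2 * (d + 1))

betas = (0.0, 0.1, 0.2, 0.4)
values = [limiting_variance(b, d=3, r=1.0, q_min=0.25) for b in betas]
```

The values increase with β and grow without bound as β approaches $2/((d+1)r^2)$, which is 0.5 for the sample values used here.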

Exactly how small $V$ must be in order for the state of the PCM to be
minimized is an open question. Intuitively, it is felt that $S(a)$ reaches
its minimum before (in the input sequence) $Q(a)$ is minimized, and the
answer is probably also the answer to Theorem 1. Further work on this
problem is indicated. There is also, of course, the need to generalize this
work to apply to any input sequence. Once these details are in order, it
should be possible to obtain an analytic expression for the number (or
average number) of trials to achieve a minimum for the state of the PCM.
Other algorithms and configurations which may yield to analysis can be
found in Nilsson. [1]

In the appendix are the results of evaluating the variance of the
gain vector for some representative values of the parameters.
APPENDIX I

Assumptions: (1) The input components are $\pm 1$.
(2) The input sequence is uniform.
(3) $0 < \beta < \frac{2}{d+1}$.

Then

\[
V = \frac{\beta (d+1) Q(a^*)}{2 - \beta (d+1)}.
\]

Let $\beta = \frac{Z}{d+1}$; then $0 < Z < 2$ and $V = \frac{ZQ}{2 - Z}$, where $Q = Q(a^*)$.

Note that $0 \le Q \le 1$ and observe Fig. 1. Figures 2 and 3 are
graphs of the variance when $Q = 0$ and therefore $V = 0$ for all values of
$Z$.

3 


Figures 4 and 5 are graphs of the variance when Q = Fila V depends 


on Z. 